Optimizing Shared Bicycle Systems for Improved Urban Mobility and Resource Allocation: Exploratory and Spatial Analysis

1.0 Introduction¶

This study addresses the optimization of shared bicycle systems to enhance urban mobility and resource allocation. As cities face increasing congestion and environmental challenges, shared bicycle systems have emerged as a sustainable solution to improve transportation efficiency (Karanikola et al., 2018). However, operational inefficiencies, such as suboptimal bicycle distribution and maintenance, hinder their full potential. This research explores how spatial and exploratory data analysis can be used to identify these inefficiencies and propose solutions. The project is highly relevant to urban data science as it utilizes spatial data analysis to improve transport systems, contributing to smarter, more sustainable urban planning. Furthermore, the study bridges gaps in academic literature regarding the optimization of shared mobility systems and provides practical insights that can be applied in urban transportation networks. The findings will offer valuable implications for both academic research and urban mobility planners seeking to enhance the functionality of shared bicycle systems.

2.0 Research Question and Objectives¶

This research aims to optimize shared bicycle systems to enhance urban mobility and resource allocation. The study identifies inefficiencies in bicycle distribution and maintenance and proposes data-driven solutions. The key research question guiding this project is:

2.1 Research Question¶

How can shared bicycle systems be optimized to improve the efficiency of urban travel chains and resource allocation?

This question is explored through spatial and exploratory data analysis aimed at improving the functionality of shared bicycle systems. The study's scope covers the examination of operational practices and the improvement of bicycle-sharing efficiency.

2.2 Research Objectives¶

  1. To understand the usage patterns of shared bicycle users.
  2. To identify geographic patterns in bike usage and in the distribution of stations.
  3. To group stations or trips based on usage patterns.
  4. To assess the operational efficiency of the bike system.

3.0 Methodology¶

In [1]:
# importing libraries
import pandas as pd
import os
import folium
from folium.plugins import HeatMap
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import geopandas as gpd
from sklearn.preprocessing import StandardScaler
from scipy.stats import chi2_contingency
from IPython.display import display
from shapely.geometry import Point
import contextily as ctx

3.1 Datasets¶

This analysis utilizes two primary datasets from the Metro Bike Share system, which provides extensive information on bike share and station details within the city. The first dataset is the Metro Bike Share Trips Dataset. This dataset captures detailed records of individual bike trips, with the variables: trip ID, duration, start and end times, start and end stations, geographical coordinates (latitude and longitude), bike ID, plan duration, trip route category, passholder type, and bike type. The data was collected through GPS tracking and user interactions, ensuring accurate recording of trip-specific details. This dataset is essential for analyzing bike usage patterns, identifying trends in trip durations, and evaluating the impact of different bike types and passholder categories on bike-sharing behaviour.

In [2]:
# Import the trip dataset (the station dataset is loaded in a later cell)
trip_data = pd.read_csv("metro-trips-2024-q1.csv", encoding='latin1')
# Define latitude and longitude bounds for Los Angeles
lat_min, lat_max = 33.7, 34.3
lon_min, lon_max = -118.7, -118.1

# Filter the data based on latitude and longitude bounds
trip_data = trip_data[
    (trip_data['start_lat'].between(lat_min, lat_max)) &
    (trip_data['start_lon'].between(lon_min, lon_max)) &
    (trip_data['end_lat'].between(lat_min, lat_max)) &
    (trip_data['end_lon'].between(lon_min, lon_max))
]

# Display the cleaned data
print("\nTrip Data of shared Bicycles")
display(trip_data.head(5))
Trip Data of shared Bicycles
     trip_id  duration     start_time       end_time  start_station  start_lat    start_lon  \
0  341828178        87  1/1/2024 0:10  1/1/2024 1:37           4515  34.039742  -118.442268
1  341816845         6  1/1/2024 0:12  1/1/2024 0:18           4602  34.164951  -118.363632
2  341817147        20  1/1/2024 0:15  1/1/2024 0:35           3064  34.046131  -118.257591
3  341817198        16  1/1/2024 0:22  1/1/2024 0:38           4543  33.957180  -118.451248
4  341816990         5  1/1/2024 0:23  1/1/2024 0:28           4518  34.057968  -118.299751

   end_station    end_lat      end_lon  bike_id  plan_duration  \
0         4564  34.035351  -118.434143    24169             30
1         4603  34.152142  -118.361954    15430              1
2         3081  34.031891  -118.250183     5913             30
3         4583  33.976189  -118.418419     6132              1
4         4587  34.060791  -118.309067    29601            365

   trip_route_category  passholder_type  bike_type
0              One Way     Monthly Pass   electric
1              One Way          Walk-up   standard
2              One Way     Monthly Pass   standard
3              One Way          Walk-up   standard
4              One Way      Annual Pass   electric

The second dataset is the Metro Bike Share Stations Dataset, which provides information on bike-sharing stations. It includes station IDs, names, operational dates, regions, status, and geographical coordinates. This dataset offers insights into the locations and operational status of the stations, which is crucial for understanding the spatial distribution and accessibility of the bike-sharing system. By mapping these stations, we can assess station density, accessibility, and their proximity to major city landmarks. Both datasets were collected from the Metro Bike Share system's records. The trip data was obtained through real-time tracking of bike trips, while the station data was compiled from operational records of bike stations. To integrate these datasets, station IDs were used to link trip records with their respective start and end locations. This integration enables a comprehensive analysis of bike trips in relation to station locations, facilitating the evaluation of station efficiency and trip patterns. Additionally, secondary datasets were used, including information on subway stations, bus stops, and railway stops. These additional data sources provide valuable context on transportation infrastructure, allowing for a more nuanced understanding of bike usage patterns and accessibility.

In [3]:
station_data = pd.read_csv("metro-bike-share-stations-2024-04-01 (4).csv", encoding='latin1')
# Define latitude and longitude bounds for Los Angeles
lat_min, lat_max = 33.7, 34.3
lon_min, lon_max = -118.7, -118.1

# Filter the data based on latitude and longitude bounds
station_data = station_data[
    (station_data['Latitude'].between(lat_min, lat_max)) &
    (station_data['Longitude'].between(lon_min, lon_max))
]

# Display the cleaned data
print("\nStation Data")
display(station_data.head(5))
Station Data
   Station_ID    Station_Name  Day of Go_live_date  Region  Status   Latitude    Longitude
1        3005    7th & Flower           07/07/2016    DTLA  Active  34.048500  -118.258537
2        3006     Olive & 8th           07/07/2016    DTLA  Active  34.045540  -118.256668
3        3007     5th & Grand           07/07/2016    DTLA  Active  34.050480  -118.254593
4        3008  Figueroa & 9th           07/07/2016    DTLA  Active  34.046612  -118.262733
5        3010    11th & Maple           07/10/2016    DTLA  Active  34.037048  -118.254868
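
The station-ID linkage described above can be sketched as two left joins, one for the start station and one for the end station. This is a minimal illustration on toy data, not the notebook's own integration step: the helper `attach_station` and the prefixed column names are hypothetical, and the toy rows only mimic the column layout of the two datasets.

```python
import pandas as pd

# Toy stand-ins for trip_data and station_data (values are illustrative only)
trips = pd.DataFrame({
    "trip_id": [1, 2, 3],
    "start_station": [3005, 3006, 9999],  # 9999 has no station record
    "end_station": [3006, 3005, 3005],
})
stations = pd.DataFrame({
    "Station_ID": [3005, 3006],
    "Station_Name": ["7th & Flower", "Olive & 8th"],
    "Region": ["DTLA", "DTLA"],
})


def attach_station(trips_df, stations_df, key, prefix):
    """Left-join station attributes onto trips using the given station-ID column.

    Hypothetical helper: station columns are prefixed so that start and end
    attributes can coexist in the same table.
    """
    renamed = stations_df.add_prefix(prefix)
    joined = trips_df.merge(
        renamed, how="left", left_on=key, right_on=f"{prefix}Station_ID"
    )
    return joined.drop(columns=f"{prefix}Station_ID")


linked = attach_station(trips, stations, "start_station", "start_")
linked = attach_station(linked, stations, "end_station", "end_")

# Trips whose start station is missing from the station table
unmatched = int(linked["start_Station_Name"].isna().sum())
print(linked)
print(f"Trips with no matching start station: {unmatched}")
```

A left join keeps every trip, so trips that reference a station missing from the station table show up as missing values rather than silently disappearing.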

The rail station shapefile contains spatial data for all railway stations across various lines. It includes details such as station locations, geometries, and identifiers. This dataset allows for the integration of railway infrastructure into spatial analyses, enhancing the understanding of how bike-sharing systems interact with public transportation networks.

In [4]:
# Define the base path to the shapefiles (without extensions)
shapefile_path = 'C:/Users/User/Downloads/230711_All_MetroRail_Stations'
files = ['.shp', '.shx', '.dbf', '.prj']

# Check if each file exists
for file_extension in files:
    file_path = shapefile_path + file_extension
    if not os.path.exists(file_path):
        print(f"Missing file: {file_path}")
    else:
        print(f"File exists: {file_path}")

# Define the path to the shapefile
shapefile_path = r'C:/Users/User/Downloads/230711_All_MetroRail_Stations.shp'

# Check if the shapefile exists
if not os.path.exists(shapefile_path):
    raise FileNotFoundError(f"The shapefile does not exist at the specified path: {shapefile_path}")

gdf_shapefile = gpd.read_file(shapefile_path)
# Define the latitude and longitude bounds for Los Angeles
lat_min, lat_max = 33.7, 34.3
lon_min, lon_max = -118.7, -118.1

# Filter the GeoDataFrame to include only rows within the specified bounds
gdf_shapefile = gdf_shapefile[
    (gdf_shapefile.geometry.y >= lat_min) & 
    (gdf_shapefile.geometry.y <= lat_max) &
    (gdf_shapefile.geometry.x >= lon_min) & 
    (gdf_shapefile.geometry.x <= lon_max)
]

# Display the filtered data
print("\nFiltered Metro Rail Stations Data within Los Angeles Bounds")
print(gdf_shapefile.head())
File exists: C:/Users/User/Downloads/230711_All_MetroRail_Stations.shp
File exists: C:/Users/User/Downloads/230711_All_MetroRail_Stations.shx
File exists: C:/Users/User/Downloads/230711_All_MetroRail_Stations.dbf
File exists: C:/Users/User/Downloads/230711_All_MetroRail_Stations.prj

Filtered Metro Rail Stations Data within Los Angeles Bounds
  STOP_ID                    STOP_NAME   STOP_LAT    STOP_LON  \
0   80101  Downtown Long Beach Station  33.768071 -118.192921   
1   80102          Pacific Ave Station  33.772258 -118.193700   
2   80105       Anaheim Street Station  33.781830 -118.189384   
3   80106    Pacific Coast Hwy Station  33.789090 -118.189382   
4   80107        Willow Street Station  33.807079 -118.189834   

                      geometry  
0  POINT (-118.19292 33.76807)  
1  POINT (-118.19370 33.77226)  
2  POINT (-118.18938 33.78183)  
3  POINT (-118.18938 33.78909)  
4  POINT (-118.18983 33.80708)  

The Lines Serving Stops shapefile contains individual points representing bus stops and the specific bus lines that serve each stop. This dataset is crucial for analyzing the accessibility of bus stops and understanding how different bus routes intersect with bike-sharing systems. It provides valuable insights into public transportation connectivity and its impact on urban mobility.

In [5]:
# Define the path to the shapefile directory and the file extensions
shape_path = 'C:/Users/User/Downloads/LineServingStops0624'
files = ['.shp', '.shx', '.dbf', '.prj']
shapefile_path = shape_path + files[0]  # The .shp file is the main file

# Read the shapefile into a GeoDataFrame
bus_serving_lines_gdf = gpd.read_file(shapefile_path)

# Define the latitude and longitude bounds for Los Angeles
lat_min, lat_max = 33.7, 34.3
lon_min, lon_max = -118.7, -118.1

# Filter the GeoDataFrame to include only rows within the specified bounds
bus_serving_lines_gdf = bus_serving_lines_gdf[
    (bus_serving_lines_gdf['LAT'] >= lat_min) & 
    (bus_serving_lines_gdf['LAT'] <= lat_max) &
    (bus_serving_lines_gdf['LONG'] >= lon_min) & 
    (bus_serving_lines_gdf['LONG'] <= lon_max)
]

# Display the filtered data
print("\nBus Serving Lines Data within Los Angeles Bounds")
print(bus_serving_lines_gdf.head())
Bus Serving Lines Data within Los Angeles Bounds
   STOPNUM  LINE DIR                    STOPNAME        LAT        LONG  \
0        1   265   S         Paramount / Slauson  33.973248 -118.113113   
1        3    35   N            Jefferson / 10th  34.025471 -118.328402   
2        6    53   N  120th / Augustus F Hawkins  33.924696 -118.242222   
3        6   120   W  120th / Augustus F Hawkins  33.924696 -118.242222   
4        6    55   N  120th / Augustus F Hawkins  33.924696 -118.242222   

                      geometry  
0  POINT (-118.11311 33.97325)  
1  POINT (-118.32840 34.02547)  
2  POINT (-118.24222 33.92470)  
3  POINT (-118.24222 33.92470)  
4  POINT (-118.24222 33.92470)  

3.2 Data Cleaning and Wrangling¶

The first step in data cleaning was to check the trip and station datasets for missing values, drop trip records with missing coordinates, and convert the date/time columns to a proper datetime format.

In [6]:
trip_data.isnull().sum()
# Drop rows where any of the specified columns have null values
trip_data = trip_data.dropna(subset=['start_lat', 'start_lon', 'end_lat', 'end_lon'])

# Convert start_time and end_time to datetime
trip_data['start_time'] = pd.to_datetime(trip_data['start_time'], format='%m/%d/%Y %H:%M')
trip_data['end_time'] = pd.to_datetime(trip_data['end_time'], format='%m/%d/%Y %H:%M')

# Add a new column for trip date only (for daily analysis)
trip_data['trip_date'] = trip_data['start_time'].dt.date

Next, a box plot was used to inspect the trip duration column for outliers.

In [7]:
# Create a new column for trip duration in minutes
trip_data['trip_duration_min'] = trip_data['duration'] 
# 1. Summary statistics for trip duration
summary_stats = trip_data['trip_duration_min'].describe()
print("Summary statistics for trip duration (in minutes):")
print(summary_stats)
# Plot a boxplot of trip durations
plt.figure(figsize=(10, 6))
sns.boxplot(x=trip_data['trip_duration_min'])
plt.title('Boxplot of Trip Durations')
plt.xlabel('Trip Duration (minutes)')
plt.show()
Summary statistics for trip duration (in minutes):
count    117660.000000
mean         27.049796
std          73.135829
min           1.000000
25%           7.000000
50%          14.000000
75%          26.000000
max        1440.000000
Name: trip_duration_min, dtype: float64
[Figure: Boxplot of Trip Durations (x-axis: Trip Duration in minutes), before outlier removal]

The box plot shows many trip-duration outliers and a strongly positive (right) skew, both of which could distort the results. Removing outliers focuses the analysis on more typical bike-sharing trips, leading to a clearer understanding of regular usage patterns. This exclusion reduces the influence of extreme values that could distort summary statistics and model accuracy. However, it might overlook rare but important cases, such as unusually long trips, which could provide insights into unique user behavior or system inefficiencies. Overall, it enhances data reliability while possibly sacrificing the full spectrum of user scenarios.

In [8]:
# Removing the outliers in the trip duration
# Define the bounds for trip duration
lower_bound = 1  # practical minimum (minutes)
upper_bound = 45 # based on IQR and practical considerations

# Filtering out outliers
trip_data = trip_data[(trip_data['trip_duration_min'] >= lower_bound) & (trip_data['trip_duration_min'] <= upper_bound)]
In [9]:
# Plot a boxplot of trip durations
plt.figure(figsize=(10, 6))
sns.boxplot(x=trip_data['trip_duration_min'])
plt.title('Boxplot of Trip Durations')
plt.xlabel('Trip Duration (minutes)')
plt.show()
[Figure: Boxplot of Trip Durations (x-axis: Trip Duration in minutes), after filtering to 1 to 45 minutes]

Setting the upper bound to 45 minutes eliminates the apparent outliers. It is a reasonable choice when focusing on regular or short-duration trips, which may be more representative of typical bike-sharing behavior. This limit aligns with common commuter or recreational trips in urban environments, especially if trips longer than 45 minutes are rare and may represent exceptional cases (such as tourists or service disruptions).
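
For comparison, the conventional 1.5 × IQR rule (Tukey fences) can be worked out from the quartiles in the summary statistics printed earlier (25% = 7, 75% = 26). The notebook does not state which rule was applied, so the 1.5 multiplier here is an assumption. The sketch shows only how the 45-minute cutoff relates to that conventional fence.

```python
# Quartiles copied from the summary statistics of trip_duration_min
q1, q3 = 7.0, 26.0

iqr = q3 - q1                    # interquartile range
lower_fence = q1 - 1.5 * iqr     # conventional lower Tukey fence
upper_fence = q3 + 1.5 * iqr     # conventional upper Tukey fence

print(f"IQR: {iqr} minutes")
print(f"Tukey fences: [{lower_fence}, {upper_fence}] minutes")
```

Under this rule the upper fence is 54.5 minutes, so the 45-minute cutoff is somewhat stricter than the pure IQR criterion. The lower fence is negative, which is why a practical minimum of 1 minute is used instead.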

In [10]:
station_data.isnull().sum()
Out[10]:
Station_ID             0
Station_Name           0
Day of Go_live_date    0
Region                 0
Status                 0
Latitude               0
Longitude              0
dtype: int64

3.1 Data Analysis¶

3.1.1 Understanding the Usage Pattern¶

In [11]:
# Bar chart for passholder types
# Use a built-in bright color palette
plt.figure(figsize=(8, 5))
# Assign x to hue (with the legend off) so the palette is applied without a FutureWarning
sns.countplot(data=trip_data, x='passholder_type', hue='passholder_type', palette='Set1', legend=False)
plt.title('Passholder Type Distribution', fontsize=16)
plt.xlabel('Passholder Type', fontsize=12)
plt.ylabel('Count', fontsize=12)
plt.xticks(rotation=45)
plt.show()
[Figure: Passholder Type Distribution (bar chart of trip counts by passholder type)]

On the passholder distribution, the majority of cyclists hold monthly passes, likely representing regular commuters (Cohen & Shaheen, 2012), potentially middle-class citizens who use bicycles as part of their daily routines. This could also reflect an environmentally conscious demographic, aware of climate change issues and seeking to reduce their carbon footprint through sustainable transport. The lower number of one-day pass users suggests that casual or recreational riders are a smaller portion of the system’s users, possibly tourists or occasional cyclists. This trend highlights a stronger reliance on the bike-sharing system as a practical commuting solution rather than a recreational service.
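
The claim that one pass type dominates can be checked numerically by converting counts to percentage shares. The sketch below uses made-up values and assumed category labels. In the notebook, the same call would be applied to `trip_data['passholder_type']`.

```python
import pandas as pd

# Illustrative values only; the labels are assumed to match the dataset's categories
passholder = pd.Series(
    ["Monthly Pass"] * 5 + ["Walk-up"] * 3 + ["Annual Pass"] + ["One Day Pass"],
    name="passholder_type",
)

# Share of trips per pass type, in percent, sorted from most to least common
shares = passholder.value_counts(normalize=True).mul(100).round(1)
print(shares)
```

Reporting shares alongside the bar chart makes it easier to say how large the majority is, rather than only that one exists.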

In [12]:
daily_trips = trip_data.groupby('trip_date').size().reset_index(name='trip_count')
# Calculate the rolling average with a window of 7 days
daily_trips['rolling_avg'] = daily_trips['trip_count'].rolling(window=7).mean()

plt.figure(figsize=(12, 6))

# Plot the daily trip counts
sns.lineplot(data=daily_trips, x='trip_date', y='trip_count', color='dodgerblue', label='Daily Trips')

# Plot the rolling average
sns.lineplot(data=daily_trips, x='trip_date', y='rolling_avg', color='orange', linestyle='--', label='7-Day Rolling Average')

plt.title('Daily Trip Counts and Rolling Average Over Time', fontsize=16)
plt.xlabel('Date', fontsize=12)
plt.ylabel('Number of Trips', fontsize=12)
plt.xticks(rotation=45)
plt.legend()
plt.grid(True)
plt.show()
[Figure: Daily Trip Counts and Rolling Average Over Time (daily trips with 7-day rolling average)]
In [13]:
# Violin plot to compare trip duration by passholder type
# Custom bright color palette
bright_palette = sns.color_palette(['#FF6F61', '#6B5B95', '#88B04B', '#F7CAC9', '#92A8D1', '#FFCC5C', '#D4A5A5'])
# Use only as many colors as there are passholder types to avoid the palette-length warning
n_types = trip_data['passholder_type'].nunique()

plt.figure(figsize=(10, 6))
# Assign x to hue (with the legend off) so the palette is applied without a FutureWarning
sns.violinplot(data=trip_data, x='passholder_type', y='duration', hue='passholder_type',
               palette=bright_palette[:n_types], legend=False)
plt.title('Trip Duration by Passholder Type', fontsize=16)
plt.xlabel('Passholder Type', fontsize=12)
plt.ylabel('Trip Duration (Minutes)', fontsize=12)
plt.xticks(rotation=45)
plt.grid(True)
plt.show()
[Figure: Trip Duration by Passholder Type (violin plot)]

The violin plot illustrates that Annual Pass holders tend to take shorter bike trips, reflecting frequent but brief journeys (Kim, 2023). In contrast, One-Day Pass holders generally use bikes for trips ranging from 10 to 30 minutes, suggesting moderate, single-day use. Walk-Up Pass holders exhibit a broader range of trip durations, indicating higher variability and more extensive use of the bike-sharing system. This variation highlights different usage patterns among pass types and may influence how bike-sharing resources are allocated.
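
The differences read off the violin plot can be backed by a per-group summary of medians and spread. This is a sketch on made-up durations and is not the notebook's output. In the notebook the same aggregation would run on `trip_data`.

```python
import pandas as pd

# Illustrative durations in minutes; not taken from the real dataset
sample = pd.DataFrame({
    "passholder_type": ["Annual Pass", "Annual Pass", "Walk-up", "Walk-up",
                        "Walk-up", "Monthly Pass", "Monthly Pass"],
    "duration": [5, 9, 6, 20, 40, 12, 16],
})

# Median and quartiles per pass type; the IQR measures variability within each group
summary = sample.groupby("passholder_type")["duration"].agg(
    trips="count",
    median="median",
    q1=lambda s: s.quantile(0.25),
    q3=lambda s: s.quantile(0.75),
)
summary["iqr"] = summary["q3"] - summary["q1"]
print(summary)
```

A larger IQR for one pass type corresponds to the wider violin described in the text, so the table gives a numeric counterpart to the visual comparison.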

In [14]:
# Set plot style and brighter color palette
sns.set(style="whitegrid")
bright_colors = sns.color_palette("bright")
# Use only as many colors as there are route categories to avoid the palette-length warning
n_categories = trip_data['trip_route_category'].nunique()

# Plot the distribution of the 'trip_route_category'
plt.figure(figsize=(8, 6))
# Assign x to hue (with the legend off) so the palette is applied without a FutureWarning
sns.countplot(x='trip_route_category', hue='trip_route_category', data=trip_data,
              palette=bright_colors[:n_categories], legend=False)

# Set plot labels and title
plt.title('Distribution of Trip Route Category', fontsize=16)
plt.xlabel('Trip Route Category', fontsize=12)
plt.ylabel('Count', fontsize=12)

# Show plot
plt.show()
[Figure: Distribution of Trip Route Category (bar chart of One Way vs Round Trip counts)]

The analysis of the trip route category reveals that 'One Way' trips dominate the usage patterns, suggesting that many users rely on bike-sharing systems for one-directional commuting or short errands (Willberg, Salonen and Toivonen, 2021). This could indicate a preference for bikes as part of a multi-modal transportation system, where users combine bikes with other forms of transport like buses or trains. The lower frequency of 'Round Trip' journeys implies that fewer users return to their starting point, possibly indicating less reliance on bikes for recreational use or circular trips.
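
One way to probe whether round trips are tied to casual use is to cross-tabulate route category against passholder type. The sketch below uses made-up rows. The column names follow the trip dataset, but the values and any pattern they show are purely illustrative.

```python
import pandas as pd

# Illustrative rows only; not taken from the real dataset
sample = pd.DataFrame({
    "passholder_type": ["Monthly Pass", "Monthly Pass", "Monthly Pass",
                        "Walk-up", "Walk-up", "Walk-up", "Walk-up"],
    "trip_route_category": ["One Way", "One Way", "One Way",
                            "One Way", "Round Trip", "Round Trip", "One Way"],
})

# Row-normalized table: share of each route category within each pass type
route_share = pd.crosstab(
    sample["passholder_type"],
    sample["trip_route_category"],
    normalize="index",
)
print(route_share)
```

Normalizing by row makes pass types of very different sizes comparable, which a raw count plot does not.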

In [15]:
from folium.plugins import HeatMap

# Create a base map
m = folium.Map(location=[trip_data['start_lat'].mean(), trip_data['start_lon'].mean()], zoom_start=12)

# Prepare data for heatmap (vectorized; avoids a slow row-by-row loop)
heat_data_start = trip_data[['start_lat', 'start_lon']].values.tolist()
heat_data_end = trip_data[['end_lat', 'end_lon']].values.tolist()

# Add HeatMap layers to the base map
HeatMap(heat_data_start, name='Start Locations').add_to(m)
HeatMap(heat_data_end, name='End Locations').add_to(m)

# Add Layer Control
folium.LayerControl().add_to(m)

# Save the map to an HTML file
m.save('heatmap_with_layers.html')
m
Out[15]:
[Interactive map: heat map layers of trip start and end locations, saved as heatmap_with_layers.html]

The heat map reveals that bike-sharing activity is heavily concentrated in downtown Los Angeles, indicating a higher density of bike trips in this area. This suggests that downtown is a key hub for bike-sharing due to its accessibility, commercial activity, and transportation links (Chun et al., 2024). The high concentration of bike usage in downtown implies a need for efficient station placement and resource management to cater to the increased demand in this central location.

3.1.2 Geographic patterns in bike usage and distribution of stations¶

In [16]:
# Aggregate trips by start station
trip_counts = trip_data['start_station'].value_counts().reset_index()
trip_counts.columns = ['Station_ID', 'Trip_Count']

# Merge with station data; stations with no recorded trips get a count of 0
station_usage = pd.merge(station_data, trip_counts, on='Station_ID', how='left')
station_usage['Trip_Count'] = station_usage['Trip_Count'].fillna(0)

# Create a base map
m = folium.Map(location=[station_usage['Latitude'].mean(), station_usage['Longitude'].mean()], zoom_start=12)

# Prepare data for heatmap: [latitude, longitude, weight]
heat_data = station_usage[['Latitude', 'Longitude', 'Trip_Count']].values.tolist()

# Add heatmap to the map
HeatMap(heat_data, radius=15, blur=10).add_to(m)

# Add station markers with blue pins
for _, row in station_usage.iterrows():
    folium.Marker(
        location=[row['Latitude'], row['Longitude']],
        popup=f"Station: {row['Station_Name']}<br>Trips: {int(row['Trip_Count'])}",
        icon=folium.Icon(color='blue', icon='info-sign')
    ).add_to(m)

# Save or display the map
m.save("heatmap_bike_usage_with_markers.html")
m
Out[16]:
[Interactive map: trip-count heat map with bike station markers, saved as heatmap_bike_usage_with_markers.html]

The heat map reveals a concentration of bike-sharing activity in areas with significant economic activity, such as the 28th & University station, which records a substantial 702 trips. This indicates that bike-sharing usage is strongly associated with locations that attract high foot traffic and economic engagement (Ricci, 2015). The high trip count in such areas suggests that these locations are key hubs for bike-sharing demand, likely driven by the presence of businesses, educational institutions, and other activities that draw people to these areas.

In [17]:
import folium
import geopandas as gpd
import pandas as pd

# Load rail station data
gdf_shapefile = gpd.read_file('C:/Users/User/Downloads/230711_All_MetroRail_Stations.shp')

# Print CRS and check geometries
print(f"CRS of rail station data: {gdf_shapefile.crs}")
print(f"Sample rail station data:\n{gdf_shapefile.head()}")

# Ensure rail station geometry is in WGS84 CRS
if gdf_shapefile.crs != "EPSG:4326":
    gdf_shapefile = gdf_shapefile.to_crs(epsg=4326)

# Drop rows with missing geometries
gdf_shapefile = gdf_shapefile[gdf_shapefile.geometry.notnull()]

# Create a base map centered on the mean location of bike stop stations
m = folium.Map(location=[station_usage['Latitude'].mean(), station_usage['Longitude'].mean()], zoom_start=12)

# Add rail station markers with red pins
for _, row in gdf_shapefile.iterrows():
    # Check if the geometry is a point
    if row.geometry.geom_type == 'Point':
        folium.Marker(
            location=[row.geometry.y, row.geometry.x],
            popup=f"Rail Station: {row['STOP_NAME']}" if 'STOP_NAME' in row else "Unnamed Rail Station",
            icon=folium.Icon(color='red', icon='train', prefix='fa')  # 'train' is a Font Awesome icon
        ).add_to(m)
    else:
        print(f"Skipping non-point geometry: {row.geometry}")

# Add bike stop station markers with blue pins
for _, row in station_usage.iterrows():
    folium.Marker(
        location=[row['Latitude'], row['Longitude']],
        popup=f"Bike Stop Station: {row['Station_Name']}",
        icon=folium.Icon(color='blue', icon='bicycle', prefix='fa')  # 'bicycle' is a Font Awesome icon
    ).add_to(m)

# Save or display the map
m.save("bike_stop_and_rail_station_distribution.html")
m
CRS of rail station data: EPSG:4326
Sample rail station data:
  STOP_ID                    STOP_NAME   STOP_LAT    STOP_LON  \
0   80101  Downtown Long Beach Station  33.768071 -118.192921   
1   80102          Pacific Ave Station  33.772258 -118.193700   
2   80105       Anaheim Street Station  33.781830 -118.189384   
3   80106    Pacific Coast Hwy Station  33.789090 -118.189382   
4   80107        Willow Street Station  33.807079 -118.189834   

                      geometry  
0  POINT (-118.19292 33.76807)  
1  POINT (-118.19370 33.77226)  
2  POINT (-118.18938 33.78183)  
3  POINT (-118.18938 33.78909)  
4  POINT (-118.18983 33.80708)  
Out[17]:
[Interactive map: bike stations (blue) and rail stations (red), saved as bike_stop_and_rail_station_distribution.html]

The map shows bike stations clustered near rail stations, which suggests that users frequently cycle to and from these transit hubs. This pattern indicates that the bike-sharing system is serving as a connector to rail transportation, facilitating multimodal travel. To enhance system efficiency, bike station capacity could be expanded near high-traffic rail stations to accommodate higher demand and reduce potential congestion. Additionally, integrating real-time data on bike availability with transit schedules could improve user convenience and ensure that bicycles are available when and where they are needed most. Implementing features like reserved bike slots or dedicated bike lanes near rail stations could further streamline connections and encourage more users to opt for this integrated mode of transportation.
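
The visual impression of clustering could be quantified by computing each bike station's distance to its nearest rail station. The sketch below uses the haversine formula on a few coordinates copied from the sample rows printed earlier in the notebook. The function `haversine_km` and the Earth-radius constant are illustrative assumptions, not part of the original analysis. Those sample rail rows happen to be Long Beach stations, so the distances are large and serve only to demonstrate the method.

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius (assumed constant)


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres; inputs in decimal degrees (broadcastable)."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    a = np.sin(dlat / 2) ** 2 + np.cos(lat1) * np.cos(lat2) * np.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(a))


# [latitude, longitude] pairs taken from the sample rows shown in the notebook output
bike = np.array([
    [34.048500, -118.258537],  # 7th & Flower
    [34.045540, -118.256668],  # Olive & 8th
    [34.037048, -118.254868],  # 11th & Maple
])
rail = np.array([
    [33.768071, -118.192921],  # Downtown Long Beach Station
    [33.807079, -118.189834],  # Willow Street Station
])

# Pairwise distance matrix (bike stations x rail stations) via broadcasting
dist = haversine_km(bike[:, None, 0], bike[:, None, 1], rail[None, :, 0], rail[None, :, 1])

nearest_idx = dist.argmin(axis=1)  # index of the closest rail station
nearest_km = dist.min(axis=1)      # distance to it, in km
print(nearest_idx, np.round(nearest_km, 2))
```

Applied to the full station and rail tables, the share of bike stations within a chosen walking threshold (for example `nearest_km <= 0.5`) would give a concrete measure of rail integration.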

3.1.3 Trips based on the usage pattern¶

3.1.3.1 DBSCAN Clustering¶

In [18]:
from sklearn.cluster import DBSCAN
from sklearn.preprocessing import StandardScaler

# Select features for clustering (Latitude, Longitude, Trip_Count) and standardize them
features = station_usage[['Latitude', 'Longitude', 'Trip_Count']].values
scaled_features = StandardScaler().fit_transform(features)

# DBSCAN labels noise points as -1; eps is measured in standardized units
dbscan = DBSCAN(eps=0.5, min_samples=5)  # Adjust parameters based on your data
station_usage['Cluster'] = dbscan.fit_predict(scaled_features)

# Create a GeoDataFrame for clusters
gdf_clusters = gpd.GeoDataFrame(
    station_usage,
    geometry=gpd.points_from_xy(station_usage['Longitude'], station_usage['Latitude'])
)

# Convex hull of each cluster label (this includes the -1 noise group, if present)
hulls = gdf_clusters[['Cluster', 'geometry']].dissolve(by='Cluster').convex_hull

# Plot polygon hulls first, then the stations on top
fig, ax = plt.subplots(figsize=(8, 12))
hulls[hulls.geom_type == 'Polygon'].plot(ax=ax, alpha=0.5, edgecolor='k')
gdf_clusters.plot(ax=ax, color='red', markersize=50, alpha=0.7, edgecolor='k')
plt.title('Cluster Boundaries with Convex Hulls')
plt.xlabel('Longitude')
plt.ylabel('Latitude')
plt.show()
[Figure: Cluster Boundaries with Convex Hulls (x-axis: Longitude, y-axis: Latitude)]

The cluster analysis of bike stations highlights two critical areas for optimization. The sparse cluster, located between latitudes 33.7-33.8 and longitudes -118.30 to -118.25, indicates an underserved region with fewer bike stations, suggesting a need for expanded coverage to improve accessibility. Conversely, the dense cluster, spanning latitudes 33.9-34.1 and longitudes -118.45 to -118.25, shows a high concentration of stations, reflecting strong demand. To address these findings, expanding bike station coverage in the sparse area can enhance service availability, while optimizing bike distribution and maintenance in the dense cluster can better accommodate high usage. These data-driven adjustments can significantly improve the efficiency and effectiveness of the bike-sharing system in alignment with urban mobility needs.
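
When reading DBSCAN output, it helps to separate genuine clusters from noise, since scikit-learn assigns the label -1 to points that belong to no cluster. The sketch below runs the same `eps=0.5, min_samples=5` settings on synthetic coordinates (two compact groups plus two isolated points). The data are invented, and only the two coordinate features are used here for simplicity.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic [latitude, longitude] points: two compact groups and two isolated stations
group_a = rng.normal(loc=[34.05, -118.25], scale=0.002, size=(20, 2))
group_b = rng.normal(loc=[33.75, -118.28], scale=0.002, size=(20, 2))
isolated = np.array([[34.20, -118.60], [33.90, -118.45]])
coords = np.vstack([group_a, group_b, isolated])

# Standardize so that eps applies on a comparable scale for both features
scaled = StandardScaler().fit_transform(coords)
labels = DBSCAN(eps=0.5, min_samples=5).fit_predict(scaled)

# Label -1 marks noise, so it is excluded from the cluster count
n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
n_noise = int((labels == -1).sum())
print(f"clusters: {n_clusters}, noise points: {n_noise}")
```

Counting noise separately matters for interpretation: a convex hull drawn around the -1 group describes scattered outlying stations, not a coherent service area.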

3.1.4 Operational Efficiency¶

Spatial analysis of the bike-sharing system shows that the majority of bike stations are concentrated in downtown Los Angeles, reflecting high usage and demand there. This concentration is evident from the heatmaps, which highlight significant activity in central urban regions. Visual comparison with the rail network indicates that many bike stations are located near major public transport hubs, which aligns with urban planning principles aimed at enhancing multimodal transportation (Chen et al., 2021). However, the spatial distribution also suggests that while downtown is well-served, other areas, particularly those further from transit lines or high-density zones, may experience bike shortages or underutilization. This uneven distribution can impact operational efficiency and service coverage. To optimize the system, it is crucial to balance bike station placements across the city, ensuring that both high-demand and underserved areas are adequately served (Shaheen et al., 2010). Future planning should consider integrating bike stations with public transit routes and identifying emerging high-demand areas to improve overall accessibility and user satisfaction (Fishman, 2016).
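
One simple indicator of where shortages or surpluses may build up is the net flow per station, that is, arrivals minus departures. This calculation is not part of the original notebook. It is a sketch on toy trips using the dataset's `start_station` and `end_station` column names, with the derived column names chosen for illustration.

```python
import pandas as pd

# Toy trips; station IDs follow the dataset's format but the rows are invented
trips = pd.DataFrame({
    "start_station": [3005, 3005, 3005, 3006, 3007],
    "end_station":   [3006, 3006, 3007, 3005, 3006],
})

departures = trips["start_station"].value_counts().rename("departures")
arrivals = trips["end_station"].value_counts().rename("arrivals")

# Align both counts on station ID; stations missing from one side get 0
flow = pd.concat([departures, arrivals], axis=1).fillna(0).astype(int)

# Negative net flow: bikes drain away (shortage risk); positive: bikes accumulate
flow["net_flow"] = flow["arrivals"] - flow["departures"]
flow = flow.sort_values("net_flow")
print(flow)
```

Stations at the two ends of the sorted table would be the first candidates for rebalancing, subject to dock capacity, which this sketch does not model.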

4.0 Results¶

The spatial and exploratory analysis of the Metro Bike Share system reveals several insights into its operational efficiency and usage patterns. The distribution of trip durations indicates that most trips are relatively short: in the unfiltered data the median is 14 minutes and three quarters of trips last 26 minutes or less, while the mean of about 27 minutes is inflated by a long tail reaching 1,440 minutes. After outlier removal, the retained trips range from 1 to 45 minutes. This trend aligns with previous findings that urban bike-sharing trips are typically brief, reflecting common commuter and short-distance travel behaviors (Shaheen et al., 2010; Fishman et al., 2013).

Analysis of passholder types shows that Monthly Pass holders dominate, suggesting that regular commuters are the primary users, consistent with studies that link bike-sharing use to regular commuting (Cohen & Shaheen, 2012). One-Day Pass holders and Walk-Up users, who generally exhibit greater variability in trip durations, likely represent less frequent, more recreational use.
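A comparison of this kind can be produced by grouping trips by pass type and reporting each group's share of trips and median duration. The sketch below assumes each trip is reduced to a `(pass_type, duration_minutes)` pair. The function name and the sample records are hypothetical and do not reproduce the study's figures.

```python
from collections import defaultdict
import statistics


def summarize_by_pass(trips):
    """Return share of trips and median duration for each pass type."""
    by_type = defaultdict(list)
    for pass_type, duration in trips:
        by_type[pass_type].append(duration)
    total = sum(len(values) for values in by_type.values())
    return {
        pass_type: {
            "share": len(values) / total,
            "median_min": statistics.median(values),
        }
        for pass_type, values in by_type.items()
    }


# Hypothetical records; real ones would come from the quarterly trip file.
SAMPLE_TRIPS = [
    ("Monthly Pass", 10),
    ("Monthly Pass", 12),
    ("Monthly Pass", 8),
    ("Walk-Up", 40),
    ("One-Day Pass", 25),
]

summary = summarize_by_pass(SAMPLE_TRIPS)
for pass_type, stats in summary.items():
    print(pass_type, f"{stats['share']:.0%}", stats["median_min"])
```

The median is used here instead of the mean because casual users' trip durations tend to be more variable, which makes the mean sensitive to a few long rides.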

The geographical distribution of trips and stations highlights inefficiencies in station placement and coverage. Visualizations reveal that bike stations are concentrated in downtown Los Angeles but are not evenly distributed across high-demand areas, corroborating similar findings from other urban bike-sharing studies (DeMaio, 2009). Combining these findings with public transport data, such as rail and bus stop locations, could strengthen the system's connection with other modes of transportation and improve overall urban mobility.

5.0 Interpretation¶

In this analysis, we examined the spatial distribution and operational efficiency of the bike-sharing system in downtown Los Angeles, focusing on trip start and end locations, bike station utilization, and proximity to metro rail stations. The clustering of trip start and end points indicated significant activity near key transit hubs, aligning with prior research that highlights the integration of bike-sharing with public transport to enhance urban mobility. However, our study reveals a higher concentration of trips starting and ending in specific recreational zones, diverging from previous studies that observed more uniform distribution patterns (Ma et al., 2020). This suggests that the recreational zones in downtown Los Angeles attract more bike-share users, possibly due to the accessibility of leisure destinations.

Furthermore, the clear demarcation of bike stations and rail stations on our heatmaps supports the notion that strategic placement of bike stations near public transit nodes can optimize resource allocation and user convenience (Williams et al., 2023). The observed variations in station utilization across different areas underscore the importance of tailored operational strategies to address local demand patterns, in contrast to the broader, one-size-fits-all approaches often applied in similar studies (Borowska-Stefańska et al., 2021). This interpretation points to the need for adaptive planning that considers both recreational and functional usage patterns to enhance system efficiency.

6.0 Conclusion¶

This study identifies key areas for optimizing the bike-sharing system's efficiency through spatial and temporal analyses. The strengths of this research include its comprehensive use of real-world trip and station data to highlight inefficiencies and propose data-driven solutions. However, limitations such as the exclusion of real-time data and the narrow geographic focus may limit the generalizability of the findings. Future research should incorporate dynamic data sources and expand beyond downtown Los Angeles to validate these results. Addressing these limitations, for example by incorporating real-time demand forecasting, could further enhance system efficiency. Overall, this study contributes valuable insights into optimizing bike-sharing systems and offers actionable recommendations for urban mobility improvements.

7.0 References¶

  • Fishman, E. (2016). Bikeshare: A Review of Recent Literature. Transport Reviews, 36(1), pp.92–113.

  • Cheng, L., De Vos, J. and Witlox, F. (2021). Transport modes and sustainability. International Encyclopedia of Transportation, 5, pp.710–714.

  • Shaheen, S., Martin, E., & Cohen, A. (2010). Public Bikesharing in North America: Early Operator and User Insights. Mineta Transportation Institute Report.

  • Macioszek, E., Świerk, P. and Kurek, A. (2020). The Bike-Sharing System as an Element of Enhancing Sustainable Mobility—A Case Study based on a City in Poland. Sustainability, 12(8), p.3285. doi:https://doi.org/10.3390/su12083285.

  • Ma, X., Ji, Y., Yuan, Y., Van Oort, N., Jin, Y. and Hoogendoorn, S. (2020). A comparison in travel patterns and determinants of user demand between docked and dockless bike-sharing systems using multi-sourced data. Transportation Research Part A: Policy and Practice, 139, pp.148–173. doi:https://doi.org/10.1016/j.tra.2020.06.022.

  • Borowska-Stefańska, M., Mikusova, M., Kowalski, M., Kurzyk, P. and Wiśniewski, S. (2021). Changes in Urban Mobility Related to the Public Bike System with Regard to Weather Conditions and Statutory Retail Restrictions. Remote Sensing, 13(18), p.3597. doi:https://doi.org/10.3390/rs13183597.

  • Cohen, A. P., & Shaheen, S. A. (2012). Planning for Bike Share: A Guide for Practitioners. Transportation Research Board.

  • Kim, K. (2023). Discovering spatiotemporal usage patterns of a bike-sharing system by type of pass: a case study from Seoul. Transportation. doi:https://doi.org/10.1007/s11116-023-10371-7.

  • https://bikeshare.metro.net/wp-content/uploads/2024/04/metro-trips-2024-q1.zip

  • https://bikeshare.metro.net/wp-content/uploads/2024/04/metro-bike-share-stations-2024-04-01.csv

  • https://developer.metro.net/wp-content/uploads/2023/07/230711_All_MetroRail_Stations.zip

  • Willberg, E., Salonen, M. and Toivonen, T. (2021). What do trip data reveal about bike-sharing system users? Journal of Transport Geography, 91, p.102971. doi:https://doi.org/10.1016/j.jtrangeo.2021.102971.

  • Chun, B., Nguyen, A., Pan, Q. and Mirzaaghazadeh, E. (2024). Spatial Analysis of Bike-Sharing Ridership for Sustainable Transportation in Houston, Texas. Sustainability, 16(6), p.2569. doi:https://doi.org/10.3390/su16062569.

  • Ricci, M. (2015). Bike sharing: A review of evidence on impacts and processes of implementation and operation. Research in Transportation Business & Management, 15, pp.28–38. doi:https://doi.org/10.1016/j.rtbm.2015.03.003.

  • Karanikola, P., Panagopoulos, T., Tampakis, S. and Tsantopoulos, G. (2018). Cycling as a Smart and Green Mode of Transport in Small Touristic Cities. Sustainability, 10(1), p.268. doi:https://doi.org/10.3390/su10010268.
